[5.0] integrate reproducible build with CI Build & Test workflow #1710

Merged
@spoonincode merged 6 commits into release/5.0 from repro_ci on Oct 24, 2023

@spoonincode (Member) commented Oct 3, 2023

Add the reproducible pinned build into the main Build & Test workflow (to be run on every PR, etc.). This build is then tested on both Ubuntu 20 and Ubuntu 22. The leap.deb file is stored as an artifact: this will be the binary we publish as a release asset.

The most shocking (and.. good) aspect of this workflow is that there is no problem taking the builddir from the Debian10+cmake3.27 environment, transporting it to a Ubuntu20+cmake3.16 & Ubuntu22+cmake3.22 environment, and all the tests continue to work correctly. I expected the difference in cmake versions at configure vs ctest time would cause some sort of problem which would necessitate the creation of a Ubuntu20+cmake3.27 & Ubuntu22+cmake3.27 platform to run ctest on instead. But this doesn't seem needed. We could still opt to take this approach if we wanted to be conservative and not mix up cmake/ctest versions.

Something I want to be mindful of is the commonality between what CI runs to do a reproducible build and what an individual user performs to produce one. What I want to avoid is a 2.0-style approach where CI & users nominally run 100% different scripts to perform the build. Ideally, the exact same steps would run in both cases. Unfortunately this PR doesn't quite reach that ideal. The good news is that both users and CI operate off the builder target in tools/reproducible.Dockerfile. But the final step in building Leap is a little different. Users will ultimately run the build target in tools/reproducible.Dockerfile:

RUN cmake -S src -B build -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=Release -GNinja && \
cmake --build build -t package -- ${LEAP_BUILD_JOBS:+-j$LEAP_BUILD_JOBS} && \
src/tools/tweak-deb.sh build/leap_*.deb

But CI runs:
cmake -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DENABLE_LEAP_DEV_DEB=On -GNinja
cmake --build build

and then, in a later step from within the build directory:
cpack
../tools/tweak-deb.sh leap_*.deb

It's important these two methods remain functionally identical for the produced .deb file to ensure that both CI & manual user builds produce the exact same output.
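One way to catch such divergence mechanically is to produce the .deb both ways and compare the files byte for byte. A minimal sketch, assuming both packages have already been built and copied somewhere local; `debs_match` is a hypothetical helper, not something in the repository, and it assumes GNU coreutils' `sha256sum` is available:

```shell
#!/bin/sh
# Hypothetical helper (not part of the leap repository): succeeds only when
# the CI-built and manually built .deb files are byte-for-byte identical.
# Assumes GNU coreutils' sha256sum is on PATH.
debs_match() {
    ci_deb=$1
    manual_deb=$2
    # sha256sum prints "<hash>  <file>"; keep only the hash.
    ci_sum=$(sha256sum "$ci_deb" | cut -d' ' -f1)
    manual_sum=$(sha256sum "$manual_deb" | cut -d' ' -f1)
    echo "CI build:     $ci_sum"
    echo "manual build: $manual_sum"
    [ "$ci_sum" = "$manual_sum" ]
}
```

For example, `debs_match ci/leap.deb manual/leap.deb || echo "CI and manual builds diverged"` (both paths are placeholders). Printing the two hashes makes a mismatch easy to report.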

work on #1641

@spoonincode added the CICD label (Anything dealing with the CI workflow behavior) Oct 3, 2023
@spoonincode linked an issue Oct 3, 2023 that may be closed by this pull request
@oschwaldp-oci (Contributor) commented Oct 3, 2023

When running the performance_harness workflow with reproducible selected and overriding the leap version to a prior version, say 4.0, the file lookup now fails because the reproducible platform name does not exist in the asset names of previous versions. Will those be back-filled somehow?

See: https://github.com/AntelopeIO/leap/actions/runs/6398790634/job/17369675575

  with:
    owner: AntelopeIO
    repo: leap
    target: 4.0
    prereleases: false
    file: leap.*reproducible.*(x86_64|amd64).deb
    token: ***
    fail-on-missing-target: true
/usr/bin/docker exec  8ac70cb786f0d4f6e73f3f3952074d77f8c3eaf15d11d3b9c1933b12a5c3fbe1 sh -c "cat /etc/*release | grep ^ID"
Error: No matching file found in resolved relrease v4.0.4

A similar failure seems to happen with the ph_backward_compatibility workflow:
See: https://github.com/AntelopeIO/leap/actions/runs/6398755146/job/17369769475#step:6:12

  with:
    owner: AntelopeIO
    repo: leap
    file: (leap).*reproducible.04.*(x86_64|amd64).deb
    target: 3.2
    prereleases: false
    token: ***
    fail-on-missing-target: true
/usr/bin/docker exec  23b41cb04953f991c1a8d153fd0d6487d3b106b91303bfc18b04290d1fc7499c sh -c "cat /etc/*release | grep ^ID"
Error: No matching file found in resolved relrease v3.2.4
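Both failures come down to the `file` pattern: it is built from the platform name, and the reproducible platform never appears in pre-5.0 asset names. A small sketch for checking patterns locally, assuming the action matches `file` as an unanchored extended regular expression against asset names (as `grep -E` would). The asset names below are made up for illustration and may not match the real release assets:

```shell
#!/bin/sh
# Hypothetical asset names for illustration only; real names may differ.
old_asset="leap_4.0.4-ubuntu22.04_amd64.deb"   # pre-5.0: platform in the name
new_asset="leap_5.0.0_amd64.deb"               # 5.0: single reproducible .deb

# matches PATTERN NAME: succeed if NAME matches the extended regex PATTERN.
# Assumption: this mirrors how the download action interprets `file`.
matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

for pattern in 'leap.*reproducible.*(x86_64|amd64).deb' \
               'leap.*ubuntu22.*(x86_64|amd64).deb' \
               'leap.*amd64.deb'; do
    for asset in "$old_asset" "$new_asset"; do
        if matches "$pattern" "$asset"; then
            echo "match:    $pattern  ->  $asset"
        else
            echo "no match: $pattern  ->  $asset"
        fi
    done
done
```

Under these assumptions the reproducible pattern matches neither name, the platform pattern matches only the older one, and the generic 'leap.*amd64.deb' matches both, so for a pre-5.0 release carrying one .deb per platform the generic pattern alone would be ambiguous.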

@spoonincode (Member, Author)

For PH workflow, it seems there will be a problem where over the course of time newer platforms will become available that aren't previously supported. If it weren't for the reproducible platform in this case, we'd eventually have the same problem with a ubuntu24 platform. But the reproducible case adds an extra complication: once we release 5.0, the leap.deb filenames will differ between 3.1-4.0 and 5.0. I'm not really sure how best to solve that, other than maybe something like:

      # pre-5.0 targets: asset names include the platform
      - if: needs.v.outputs.leap-target != 'DEFAULT' && (startsWith(needs.v.outputs.leap-target, '3') || startsWith(needs.v.outputs.leap-target, '4'))
        name: Download Prev Leap Version
        uses: AntelopeIO/asset-artifact-download-action@v3
        with:
          owner: AntelopeIO
          repo: leap
          target: '${{needs.v.outputs.leap-target}}'
          prereleases: ${{fromJSON(needs.v.outputs.leap-prerelease)}}
          file: 'leap.*${{github.event.inputs.platform-choice}}.*(x86_64|amd64).deb'
      # 5.0+ targets: a single, platform-independent asset
      - if: needs.v.outputs.leap-target != 'DEFAULT' && !(startsWith(needs.v.outputs.leap-target, '3') || startsWith(needs.v.outputs.leap-target, '4'))
        name: Download Prev Leap Version
        uses: AntelopeIO/asset-artifact-download-action@v3
        with:
          owner: AntelopeIO
          repo: leap
          target: '${{needs.v.outputs.leap-target}}'
          prereleases: ${{fromJSON(needs.v.outputs.leap-prerelease)}}
          file: 'leap.*amd64.deb'

I guess PHBC is somewhat similar: some particular platform may only be supported over a subset of all releases.

But I think there is another tricky nuance here for these workflows too, including for 3.x & 4.0: when we build ubuntu20 or ubuntu22 we're building unpinned, but when we fetch a release from the release assets we're getting a pinned build. That means some of these comparisons may not be 1:1.

Ultimately I'm not really sure what to do here..

@@ -290,7 +314,7 @@ jobs:

   all-passing:
     name: All Required Tests Passed
-    needs: [dev-package, tests, np-tests, libtester-tests]
+    needs: [tests, np-tests, libtester-tests]
@spoonincode (Member, Author)

Removed dev-package here since libtester-tests job(s) already depend on it.

- cfg: {name: 'ubuntu20', base: 'ubuntu20', builddir: 'ubuntu20'}
- cfg: {name: 'ubuntu22', base: 'ubuntu22', builddir: 'ubuntu22'}
- cfg: {name: 'ubuntu20repro', base: 'ubuntu20', builddir: 'reproducible'}
- cfg: {name: 'ubuntu22repro', base: 'ubuntu22', builddir: 'reproducible'}
@spoonincode (Member, Author)

This is not what I was expecting this to end up looking like; I was thinking there would be some sort of JSON file or similar that defined the platform hierarchy for building and then testing, but I simply ran out of time to explore other approaches. Honestly, other than copy-pasting it 3 times, this isn't too bad. But this is exactly the kind of plumbing discussed in this comment chain: #1703 (comment)

@linh2931 (Member)

With care, this kind of copy/paste is OK, avoiding complicating logic.

@spoonincode (Member, Author)

I'd like to figure out a good way to avoid it, but that can come later. The problem I'm having is that any approach to avoid this repetition seems to introduce significant indirection.

For example, maybe a JSON or YAML file in the repo would define the relationships between the platforms we're building, what each build gets tested on, which runners are used (in case we ever add ARM, etc.), whether the libtester tests should run (on anything but Ubuntu, no), etc. But that would involve a non-trivial chunk of, I guess, JavaScript at the start of the job to 'unpack' and process that definition into various rules used in the workflow: complicating indeed.
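To make the trade-off concrete, here is a sketch of that kind of 'unpacking', swapping the JSON/YAML-plus-JavaScript idea for a plain-text table and POSIX shell. The table format and the `expand_matrix` function are hypothetical; under these assumptions the output reproduces the four cfg entries used in this PR:

```shell
#!/bin/sh
# Hypothetical platform-relationship table (not a file in the repository).
# Each line: <build platform/builddir> <test platform> [<test platform> ...]
PLATFORM_TABLE='ubuntu20 ubuntu20
ubuntu22 ubuntu22
reproducible ubuntu20 ubuntu22'

# Expand the table into one matrix entry per (build, test platform) pair,
# in the same shape as the cfg lines in the workflow.
expand_matrix() {
    printf '%s\n' "$PLATFORM_TABLE" | while read -r builddir bases; do
        for base in $bases; do
            case "$builddir" in
                "$base")      name=$base ;;            # built and tested on the same platform
                reproducible) name="${base}repro" ;;   # reproducible build tested elsewhere
                *)            name="${base}-${builddir}" ;;
            esac
            printf '%s\n' "- cfg: {name: '$name', base: '$base', builddir: '$builddir'}"
        done
    done
}

expand_matrix
```

Even this small version adds a file format and a transformation step that a reader has to trace through, which is the indirection being weighed against four copy-pasted lines.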

@bhazzard commented Oct 4, 2023

Tracked by: #1717

@spoonincode linked an issue Oct 4, 2023 that may be closed by this pull request
Base automatically changed from repro to main October 4, 2023 18:40
     cmake --build build -t package -- ${LEAP_BUILD_JOBS:+-j$LEAP_BUILD_JOBS} && \
-    src/tools/tweak-deb.sh build/leap_*.deb
+    /__w/leap/leap/tools/tweak-deb.sh build/leap_*.deb
@spoonincode (Member, Author)

And here is a perfect example of how this can fail:

> It's important these two methods remain functionally identical for the produced .deb file to ensure that both CI & manual user builds produce the exact same output

This change was required for the above statement to remain true, because in some cases full paths get baked into the final executable. When built via build-base.yaml the source path is effectively /__w/leap/leap/, while in reproducible.Dockerfile it was /src/. I considered 3 ways to fix this:

  1. Change CI to place source in /src. This involves changes in many locations because we use actions/checkout@v3 in many different jobs during the build & tests.
  2. Change CI to make use of something like -ffile-prefix-map=$PWD=/src. This seems to work, but results in more variation of compile flags between CI & the manual build.
  3. Change reproducible.Dockerfile's build stage to place source in /__w/leap/leap like CI does.

Given the 11th hour, I opted for option 3, even though it has a downside: CI only places source in /__w/leap/leap if the repository is named leap. If someone were to fork the repo under a different name, they would see divergence between CI & manual reproducible builds. But our current CI workflows are, unfortunately, ENF-centric (see #416 for how this could be resolved; though I doubt something like the NP Tests will pass on a free GH runner these days), so this seems like an okay tradeoff for the time being. We can migrate to option 2 in the future if really needed.
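A quick way to see which of these source paths actually end up in the packaged binaries is to extract the .deb and search its files for the path prefixes. A sketch, assuming a Debian-based environment with `dpkg-deb` for extraction; `baked_in_files` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: list files under DIR whose bytes contain PREFIX
# literally. -r recurses, -l prints only file names, -a treats binary
# files as text, -F disables regex interpretation of PREFIX.
baked_in_files() {
    grep -rlaF -- "$2" "$1"
}

# Example usage (paths are placeholders):
#   dpkg-deb -x build/leap_amd64.deb /tmp/leap-extract
#   for prefix in /__w/leap/leap /src; do
#       echo "== $prefix"
#       baked_in_files /tmp/leap-extract "$prefix"
#   done
```

A bare /src will also match unrelated strings such as /usr/src, so hits for it are only hints; the longer /__w/leap/leap prefix is far less ambiguous.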

@oschwaldp-oci (Contributor)

> For PH workflow, it seems there will be a problem where over the course of time newer platforms will become available that aren't previously supported.

I think I'm struggling a little with the definition of reproducible as a platform, from the viewpoint of how the workflows have been constructed. Even in the new matrix configuration, for instance, it is a combination:

  • cfg: {name: 'ubuntu22repro', base: 'ubuntu22', builddir: 'reproducible'}

Where I realistically think of the platform being used to build up the build or test environment as the base: 'ubuntu22' portion.

Maybe there needs to be a distinction between the base platform to build up from and whether you're using reproducible in that context?

@spoonincode (Member, Author)

I am not sure I understand, it seems like maybe the confusion is because some of the platforms are so overloaded? For example, the ubuntu20 platform ends up,

  • building leap on ubuntu 20 ("unpinned" build)
  • testing the ubuntu 20 unpinned leap build on ubuntu 20
  • testing the reproducible leap build on ubuntu 20

which makes me wonder if we should overload them less and have separate ubuntu20-build and ubuntu20-test platforms? But that still doesn't solve the problem that the reproducible build needs to be tested on multiple platforms completely different from the platform that built it, nor the problem that we need some way to define these relationships in CI (the cfg thing at the moment).

@oschwaldp-oci (Contributor)

> I am not sure I understand, it seems like maybe the confusion is because some of the platforms are so overloaded?

That is kind of what I was getting at. In the matrix we have the option of defining things a little better, as you did with the cfg object, or of breaking out multiple fields (platform, other, etc.); unlike the cfg approach, though, those would then go into the multi-dimensional matrix, and each combination would create a scenario that is run. My point is that, in the past, the platform has always seemed to define which operating system is the basis, regardless of what you are doing (building, testing, etc.). So whenever ubuntu22 is seen as the platform and included in the naming of artifacts, it is clear where that came from and what it can be used for (i.e. if it is a .deb package, which OS it is expected to install properly on). You don't get that same understanding with a platform of reproducible, especially as that reproducible build is currently installable on multiple platforms and it does not stand alone -- from the simple name reproducible one doesn't directly know what the os requirement is for instance. It would seem that the platform for the reproducible build is actually Debian (debian:buster), as defined in the Dockerfile, which might then hint at which Ubuntu flavors you could install it on. It just seems that platform is being overloaded with new meaning in this PR which doesn't translate to the other workflows expectations/understanding.

@oschwaldp-oci (Contributor)

From the PR description:

> It's important these two methods remain functionally identical for the produced .deb file to ensure that both CI & manual user builds produce the exact same output.

Is there a way to codify this or test for it such that this one comment is not lost to history in the PR description and forgotten?

@spoonincode (Member, Author)

> Is there a way to codify this or test for it such that this one comment is not lost to history in the PR description and forgotten?

The first hunch would be to add a script that is used in both locations to ensure they remain the same. But I find it rather unfortunate to add a layer of indirection for 3 or 4 lines. There are also some slight differences between the two: the manual user build (reproducible.Dockerfile) builds the packages directly, whereas CI delays that until a later step; and CI passes -DENABLE_LEAP_DEV_DEB=On where the manual user build does not. So such a script wouldn't simply be a layer of indirection but would also require parameters etc., which makes it even less palatable imo.

Fortunately, if they diverge, the thought is we'll discover it via our code signing procedures: when the dev team goes to reproduce the build locally to sign it, we'll realize the CI and local builds don't match up. Now, we haven't created this procedure yet, so I don't know the exact steps; it might even be manual verification, and it might be annoyingly late (right when we are about to do a release, we discover a problem). Maybe not a good answer, but it's something.
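For comparison, this is roughly what such a shared script would have to look like once the two differences above become parameters. A dry-run sketch that only prints the commands; the function name and its arguments are hypothetical, and the `${LEAP_BUILD_JOBS}` handling is omitted for brevity:

```shell
#!/bin/sh
# Hypothetical shared build script, as a dry run (prints commands only).
#   $1  source directory
#   $2  "dev" to also build the leap-dev package (what CI does)
#   $3  "package" to build the .deb right away (what reproducible.Dockerfile
#       does); anything else leaves cpack + tweak-deb.sh to a later step (CI)
leap_build_cmds() {
    src=$1
    dev_flag=""
    if [ "$2" = "dev" ]; then
        dev_flag=" -DENABLE_LEAP_DEV_DEB=On"
    fi
    echo "cmake -S $src -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr${dev_flag} -GNinja"
    if [ "$3" = "package" ]; then
        echo "cmake --build build -t package"
        echo "$src/tools/tweak-deb.sh build/leap_*.deb"
    else
        echo "cmake --build build"
    fi
}

echo "# manual (reproducible.Dockerfile)"
leap_build_cmds src nodev package
echo "# CI"
leap_build_cmds . dev later
```

Two mode switches to cover three or four commands is the kind of indirection the comment above is wary of.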

> You don't get that same understanding with a platform of reproducible, especially as that reproducible build is currently installable on multiple platforms and it does not stand alone -- from the simple name reproducible one doesn't directly know what the os requirement is for instance.

Yeah but that's the feature. The output from the reproducible build can run on many distros, and the 'reproducible platform' builds up the environment to accomplish that. I guess renaming the reproducible platform to something like reproducible-build-runson-ubuntu20-or-later would be a more accurate description, but is that sort of mouthful helpful? It's also more troublesome to change if (when) we change the minimum requirement in the future. And regardless of the name we still need plumbing to know that the build output from the reproducible platform needs to be tested on both the ubuntu20 and ubuntu22 platforms.

> It just seems that platform is being overloaded with new meaning in this PR which doesn't translate to the other workflows expectations/understanding.

What should the meaning of a platform file be? If we separate out the platforms to have a single role (even if that means some duplication and/or very simple platform files) does it become an improvement?

@BenjaminGormanPMP removed the request for review from oschwaldp-oci October 12, 2023 21:11
@oschwaldp-oci (Contributor) commented Oct 13, 2023

Just as a follow-up to your suggestion above about a way to solve some of the issue in the near term, I used this over in another PR for a workflow requiring leap package install (there are a couple tweaks from what you posted, particularly in the second if clause):

      - if: needs.versions.outputs.leap-target != 'DEFAULT' && (startsWith(needs.versions.outputs.leap-target, 3) || startsWith(needs.versions.outputs.leap-target, 4))
        name: Download leap binary
        uses: AntelopeIO/asset-artifact-download-action@v3
        with:
          owner: AntelopeIO
          repo: leap
          target: '${{needs.versions.outputs.leap-target}}'
          prereleases: ${{fromJSON(needs.versions.outputs.leap-prerelease)}}
          file: 'leap.*${{matrix.platform}}.*(x86_64|amd64).deb'
          token: ${{ secrets.GITHUB_TOKEN }}
      # Runs for DEFAULT and for any target other than 3.x/4.x
      - if: needs.versions.outputs.leap-target == 'DEFAULT' || !(startsWith(needs.versions.outputs.leap-target, 3) || startsWith(needs.versions.outputs.leap-target, 4))
        name: Download Prev Leap Version
        uses: AntelopeIO/asset-artifact-download-action@v3
        with:
          owner: AntelopeIO
          repo: leap
          target: '${{needs.versions.outputs.leap-target}}'
          prereleases: ${{fromJSON(needs.versions.outputs.leap-prerelease)}}
          file: 'leap.*amd64.deb'

It might be nice to reduce the code duplication by holding the file name pattern in a variable that defaults to the new pattern, 'leap.*amd64.deb', and is only overwritten if the version is < 5, say up in the versions|v job. That way the step would use something like file: ${{needs.versions.outputs.leap-file-regex}}, and we'd only need one asset-artifact-download-action@v3 block/step.
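A sketch of that single-variable approach, written as the shell a `run:` step in the versions|v job might execute. The output name `leap-file-regex` comes from the comment above; the function name, its arguments, and the `$TARGET`/`$PLATFORM` variables are hypothetical, and the version test mirrors the startsWith checks for 3 and 4 used in the workflow snippets:

```shell
#!/bin/sh
# Hypothetical helper for the versions|v job: choose the release-asset regex
# for a leap target so that a single download step can use
#   file: ${{needs.versions.outputs.leap-file-regex}}
#   $1  leap target, e.g. DEFAULT, 3.2, 4.0, 5.0
#   $2  platform, e.g. ubuntu20 or ubuntu22
leap_file_regex() {
    case "$1" in
        3*|4*)
            # pre-5.0 releases: platform is part of the asset name
            printf '%s\n' "leap.*$2.*(x86_64|amd64).deb" ;;
        *)
            # 5.0+ (and DEFAULT): platform-independent name
            printf '%s\n' 'leap.*amd64.deb' ;;
    esac
}

# In a workflow step, the value would be exported as a step output, e.g.:
#   echo "leap-file-regex=$(leap_file_regex "$TARGET" "$PLATFORM")" >> "$GITHUB_OUTPUT"
```

Keeping the version logic in one place means a future platform change only touches this helper rather than every download step.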

@oschwaldp-oci (Contributor)

> The first hunch would be to add a script that is used in both locations to ensure they remain the same.

The script idea jumped out initially to me as well, but I think you pointed out the same issues with that as came to mind when I was debating it. Like you pointed out, maybe it isn't as big of an issue as it may seem initially, especially if engineers will most probably catch any differences during the signing procedure.

> What should the meaning of a platform file be? If we separate out the platforms to have a single role (even if that means some duplication and/or very simple platform files) does it become an improvement?

So, for the two comment/response chains above: I think reproducible is fine as a platform, especially since it is represented by a specific Dockerfile platform description. Where it starts to get a little hairy in my mind is when we start to use it as a base instead of as something that ports to another workflow. If you take the output of the reproducible build and want to use it in another workflow or job, you can use it in many OS/platform situations: ubuntu20, ubuntu22, reproducible, etc. However, how does that translate when reproducible becomes the base platform, you want to install other packages onto it, and we only build those packages for ubuntu20?

I think we have a solution of getting around the naming conventions for downloading the correct artifacts, as we've discussed above. Should work for now and as things evolve, so can the solution.

I am still curious, though, about how the reproducible platform fits in with our other base platforms (ubuntu20/ubuntu22, for instance) when other packages we develop have to be built on top of it.

@heifner removed this from the Leap v5.0.0-stable milestone Oct 17, 2023
@bhazzard commented Oct 19, 2023

To solve the obvious blockers, let's:

@spoonincode changed the base branch from main to release/5.0 October 23, 2023 14:21
@spoonincode changed the title integrate reproducible build with CI Build & Test workflow [5.0] integrate reproducible build with CI Build & Test workflow Oct 23, 2023
@linh2931 (Member) left a comment

Looks good to me.

@spoonincode merged commit 0501a7e into release/5.0 Oct 24, 2023
29 checks passed
@spoonincode deleted the repro_ci branch October 24, 2023 19:25
Labels: CICD (Anything dealing with the CI workflow behavior)
Projects: None yet
Development: successfully merging this pull request may close the issue "Integrate reproducible build with CI Build & Test workflow"
5 participants